Apply Databricks Labs Repository Lockdown policy #19
Implements the three-script workflow from the Databricks Labs Repository Lockdown policy: list-external-actions -> resolve-action-ref -> pin-gh-actions.

- list-external-actions: emits every third-party action referenced under .github/ (requires yq by Mike Farah).
- resolve-action-ref: for each action, finds the most recent release tag published before the cutoff (2026-03-10T00:00:00Z) and resolves it to a commit SHA. Handles both mono-repo conventions: subpath-prefixed tags (databrickslabs/sandbox/acceptance -> acceptance/v0.4.4) and top-level shared tags (github/codeql-action/analyze -> v4.32.6, where the subpath is just a directory inside a repo using a unified tag series).
- pin-gh-actions: consumes resolve-action-ref output, rewrites every matching `uses:` under .github/ with the SHA form + tag comment, and stages (but does not commit) the result. Skips databricks/databrickslabs actions per policy.

Deviates from the blueprint reference in one way: does not auto-create or switch branches, because GeoBrix manages branches manually. README documents the typical flow and the 2026-03-10 cutoff.

Co-authored-by: Isaac
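The cutoff rule described for resolve-action-ref can be sketched as a small filter. This is an illustration only, not the script from this PR: the function name `latest_before`, the `<published_at> <tag>` input format, and the sample tags and dates are all assumptions. It relies on ISO 8601 UTC timestamps of identical format ordering chronologically when compared as strings.

```sh
# latest_before CUTOFF
# Hypothetical helper (not the PR's resolve-action-ref). Reads lines of the form
# "<published_at> <tag>" on stdin, where published_at is an ISO 8601 UTC
# timestamp such as 2026-03-10T00:00:00Z, and prints the tag of the most recent
# release published strictly before CUTOFF. Prints nothing if none qualifies.
latest_before() {
  cutoff=$1
  # Appending "" forces awk to compare as strings; same-format UTC timestamps
  # then order chronologically.
  awk -v cutoff="$cutoff" '($1 "") < (cutoff "")' |
    LC_ALL=C sort |
    tail -n 1 |
    cut -d' ' -f2
}

# Made-up sample data; only the cutoff value comes from the policy.
printf '%s\n' \
  '2026-03-12T08:00:00Z v2.0.0' \
  '2026-01-09T10:00:00Z v1.1.0' \
  '2025-11-20T09:30:00Z v1.0.0' |
  latest_before 2026-03-10T00:00:00Z
```

In a real run the input lines would presumably come from the GitHub releases API, which exposes `published_at` and `tag_name` for each release, and the selected tag would still need to be resolved to a commit SHA. Subpath-prefixed tags such as acceptance/v0.4.4 would need an additional filter on the tag prefix, which this sketch leaves out.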
Every third-party `uses:` under .github/workflows/ and .github/actions/ is
now pinned to the commit SHA of the most recent release published before
2026-03-10T00:00:00Z, with the release tag preserved as an inline comment
for cross-reference (the comment is informational only — reviewers must
re-verify the SHA against the upstream release). Generated by running:
./scripts/security/list-external-actions \
| xargs ./scripts/security/resolve-action-ref \
| ./scripts/security/pin-gh-actions
Resolutions (all 15 external refs, ordered; every ref was on a mutable
tag prior to this change):
actions/cache@v4, v5             -> cdf6c1fa... # v5.0.3
actions/checkout@v5              -> de0fac2e... # v6.0.2 (major bump)
actions/deploy-pages@v4          -> d6db9016... # v4.0.5
actions/download-artifact@v5     -> 70fc10c6... # v8.0.0 (major bump)
actions/setup-java@v5            -> be666c2f... # v5.2.0
actions/setup-node@v4            -> 53b83947... # v6.3.0 (major bump)
actions/setup-python@v5          -> a309ff8b... # v6.2.0 (major bump)
actions/upload-artifact@v5       -> bbbca2dd... # v7.0.0 (major bump)
actions/upload-pages-artifact@v3 -> 7b1f4a76... # v4.0.0 (major bump)
codecov/codecov-action@v5        -> 671740ac... # v5.5.2
github/codeql-action/*@v4        -> 0d579ffd... # v4.32.6
pypa/gh-action-pypi-publish@...  -> ed0c5393... # v1.13.0
Major-version jumps are consistent with the policy ("latest release before
the cutoff") but carry breaking-change risk — reviewers should validate
each bump against the action's CHANGELOG before merge. In particular,
upload-artifact v4+ and download-artifact v4+ changed artifact immutability
semantics; the new versions may interact with the existing upload_artifacts
composite action in ways worth exercising under CI before unblocking.
Local composite action refs (./.github/actions/*) are unaffected —
they're first-party.
Co-authored-by: Isaac
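The `uses:` rewrite described in this commit can be illustrated with a single `sed` substitution. This is a rough sketch, not the pin-gh-actions script itself: the function name `pin_uses`, its argument order, the sample workflow lines, and the SHA (a placeholder, not a real commit) are all assumptions.

```sh
# pin_uses ACTION SHA TAG
# Hypothetical helper (not the PR's pin-gh-actions). Filters workflow YAML from
# stdin to stdout, rewriting `uses: ACTION@<ref>` to `uses: ACTION@SHA # TAG`.
# Any existing trailing comment on the line is replaced. ACTION is used as a
# regex, so a dot in an action name would match any character.
pin_uses() {
  action=$1 sha=$2 tag=$3
  sed -E "s|(uses:[[:space:]]*${action})@[^[:space:]#]+[[:space:]]*(#.*)?\$|\1@${sha} # ${tag}|"
}

# Placeholder SHA and made-up workflow lines, for illustration only.
pin_uses actions/checkout 0123456789abcdef0123456789abcdef01234567 v6.0.2 <<'EOF'
    steps:
      - uses: actions/checkout@v5
      - uses: ./.github/actions/example
EOF
```

Only the line naming the given action changes; the local `./.github/actions/example` reference passes through untouched, which matches the first-party exemption noted above. Because the tag comment is written by the tool and not checked by GitHub, the SHA itself still has to be verified against the upstream release.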
…kflows

Databricks Labs Repository Lockdown policy requires any workflow using a non-exempt secret (anything other than GITHUB_TOKEN or CODECOV_TOKEN) to run inside a single protected GitHub Environment. GeoBrix uses REPO_ACCESS_TOKEN (PAT fallback for private-repo checkout) across most workflows, so every job that calls actions/checkout with that token now sets `environment: runtime`.

Changes:
- Added `permissions: contents: read` at top level where missing (codeql-analysis, publish-maven, release) and removed stray top-level `id-token: write` from build_main / build_python / build_scala / build_scala_by_package / codecov-scala-parallel / codecov-upload (none of those jobs request OIDC tokens).
- deploy-docs: moved `pages: write` and `id-token: write` from top level down to the deploy job only (least privilege). The build job keeps `environment: runtime` for its REPO_ACCESS_TOKEN checkout; the deploy job keeps its existing `environment: github-pages`.
- doc-tests: added `environment: runtime` on all three (currently disabled) jobs that perform REPO_ACCESS_TOKEN checkouts, so they are compliant when re-enabled.
- release.yml: changed `environment: release` -> `environment: runtime` to converge on the single protected env the policy expects.
- release.yml + publish-maven.yml: DISABLED via `if: false` on their publish jobs with a banner comment explaining the policy context and how to re-enable. GeoBrix is not publishing to PyPI or GitHub Packages from Actions today; we will coordinate with Labs before re-enabling.

Exempt secrets per policy (GITHUB_TOKEN, CODECOV_TOKEN) are untouched and do not require the protected environment.

Co-authored-by: Isaac
Labs Repository Lockdown policy: every Dependabot ecosystem in the repo must apply a cooldown so we are not the first adopters of a just-released (possibly compromised) version.

Applied `cooldown.default-days: 7` to both maven and pip ecosystems. The policy also excludes `github-actions` from Dependabot entirely; action SHAs are refreshed manually via scripts/security/pin-gh-actions so bumps are reviewed as part of the security workflow rather than as auto-opened PRs. Added a comment documenting the intentional absence.

Co-authored-by: Isaac
Databricks Labs Repository Lockdown policy requires all build-time binary fetches to be integrity-verified and all base images to be pinned by digest so a compromised registry/mirror cannot silently swap bytes.

Dockerfile changes:
- Pinned `FROM ubuntu:24.04` to the multi-arch manifest-list digest `sha256:c4a8d5503dfb2a3eb8ab5f807da5bc69a85730fb49b5cfca2330194ebcc41c7b` (kept `# ubuntu:24.04` comment for human readability).
- Hadoop 3.4.0 tarball: replaced `wget | tar` stream with download -> sha512sum -c -> extract, using the official HADOOP_SHA512 from downloads.apache.org/.sha512.
- GDAL 3.11.4 tarball: same pattern with a locally-computed SHA-256. OSGeo only publishes MD5; we MD5-verified the upstream download (9f4fa4b3be48fb60d5dd76fecb11a5f6) then computed and pinned SHA-256.
- Apache Maven 3.9.9: replaced the dynamic `.sha512` fetch (which reads the checksum from the same origin as the tarball and therefore provides no protection against origin compromise) with an in-Dockerfile pinned MAVEN_SHA512 ARG, cross-checked against archive.apache.org.

scripts/util/install_hadoop.sh:
- Not referenced by the build; kept as a manual mirror of the Dockerfile flow. Rewrote with `set -euo pipefail`, a pinned HADOOP_SHA512, and `sha512sum -c` verification. Made executable.

Each checksum has a matching comment documenting the authoritative source and the requirement to bump it in lockstep with the underlying version.

Co-authored-by: Isaac
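The download -> sha512sum -c -> extract pattern named above could look roughly like the following. This is a minimal sketch assuming GNU coreutils `sha512sum` and a gzip tarball; the function names `verify_sha512` and `extract_verified` are hypothetical and are not taken from the PR's Dockerfile or from install_hadoop.sh.

```sh
# Hypothetical helpers; names and layout are assumptions, not the PR's code.

# verify_sha512 FILE EXPECTED_HEX
# Succeeds only if FILE's SHA-512 equals EXPECTED_HEX. `sha512sum -c` expects
# "<hex>  <path>" lines (two spaces) and exits non-zero on any mismatch.
verify_sha512() {
  printf '%s  %s\n' "$2" "$1" | sha512sum -c - >/dev/null 2>&1
}

# extract_verified TARBALL EXPECTED_HEX DEST
# Extracts TARBALL into DEST only after the checksum matches, so unverified
# bytes are never unpacked (unlike a streamed `wget | tar`).
extract_verified() {
  if ! verify_sha512 "$1" "$2"; then
    echo "checksum mismatch for $1" >&2
    return 1
  fi
  mkdir -p "$3" && tar -xzf "$1" -C "$3"
}
```

A build step would first download the tarball to a file and then call something like `extract_verified hadoop.tar.gz "$HADOOP_SHA512" /opt`, with the expected hash pinned in the build file and not fetched alongside the tarball. The key design point is that the pinned value lives in the repository, so a compromised mirror cannot change both the bytes and the checksum.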
gueniai left a comment
- Is it possible to also lock PDAL to a SHA?
- The PR adds environment: runtime to the three disabled jobs (test-python-docs, validate-structure, test-scala-docs), but those jobs still contain:
ref: ${{ github.event_name == 'workflow_run' && github.event.workflow_run.head_sha || github.sha }}
When triggered via workflow_run from a fork PR, github.event.workflow_run.head_sha is the fork's commit. When someone removes the if: false to
re-enable these jobs, they will check out attacker-controlled code in a job that now has REPO_ACCESS_TOKEN available — because this PR added the
environment gate.
Before this PR: disabled jobs had no environment, so re-enabling would give you fork code + no REPO_ACCESS_TOKEN.
After this PR: disabled jobs have environment: runtime, so re-enabling gives you fork code + REPO_ACCESS_TOKEN.
The environment: runtime addition is correct in principle, but without the origin guard it makes re-enablement more dangerous, not less. The guard that
must be added before removing if: false:
if: github.event.workflow_run.head_repository.full_name == github.repository
Consider adding this as a comment block (or as a disabled if: condition) directly in the file now, so whoever re-enables the jobs can't miss it.
… jobs

Addresses review feedback on PR #19:

1. Pin PDAL 2.8.2 to commit SHA 736fa0a66af4bed7105dff5fa152edf26bbb8a3a. Tags are mutable; switch the pdal-builder stage from `git clone -b <tag>` to `git fetch --depth 1 origin <SHA>` + `git checkout FETCH_HEAD`. New ARG PDAL_SHA is documented alongside PDAL_VERSION with the bump procedure, matching the Hadoop/GDAL/Maven pattern.
2. Add a SECURITY banner above the `if: false` line on each disabled job in doc-tests.yml (test-python-docs, test-scala-docs, validate-structure). These jobs now bind environment: runtime (which scopes REPO_ACCESS_TOKEN); combined with the workflow_run trigger and head_sha checkout used by two of the three jobs, naively re-enabling would expose REPO_ACCESS_TOKEN to fork-controlled code. Banner prescribes the required origin guard:
   if: github.event.workflow_run.head_repository.full_name == github.repository
   Banner also added to test-scala-docs since copy-paste from siblings is the likely re-enable path.

Co-authored-by: Isaac
Addressed in 51f428b.
1. PDAL pinned to commit SHA. Resolved tag …
2. Origin-guard banner added to all three disabled jobs in …
   if: github.event.workflow_run.head_repository.full_name == github.repository
I added the banner to …
Required by the Databricks Labs Repository Lockdown policy. Combined with branch protection, the listed team must approve every PR before merge. Pattern matches sibling labs repos (ucx, blueprint, dqx): root-level CODEOWNERS with a single `*` rule pointing to the per-repo write team.

Co-authored-by: Isaac
Added CODEOWNERS in b649702. Mirrors the sibling labs pattern (…
Summary
Applies the Databricks Labs Repository Lockdown policy to GeoBrix ahead of the 2026-03-10 SHA-pinning cutoff. Scope is lockdown items 1, 3–6 (item 2, Hatch→uv, is N/A — GeoBrix is Scala/Maven + setuptools Python with no Hatch).
Five commits on top of `master`:
- 3ff670a Add `scripts/security/` action-pinning tooling (`list-external-actions`, `resolve-action-ref`, `pin-gh-actions`, README).
- 514871b Pin external GitHub Actions to commit SHAs (cutoff 2026-03-10). Every `uses: org/repo@<tag>` in `.github/workflows/` and `.github/actions/` is rewritten to `@<sha> # <tag>`. Local first-party `uses: ./.github/actions/*` refs are intentionally unchanged. Tooling is rerunnable.
- 0fab757 Permissions + `environment: runtime` hardening.
  - Top-level `contents: read` added where missing; stray top-level `id-token: write` removed from jobs that never request OIDC.
  - Every job using `REPO_ACCESS_TOKEN` (the only non-exempt secret in use) now runs in the single protected environment `runtime`.
  - `deploy-docs` drops `pages: write` / `id-token: write` from top level; moved to the `deploy` job only.
  - `release.yml`'s `environment: release` renamed → `runtime`.
  - `release.yml` and `publish-maven.yml` disabled via `if: false` with banner comments and re-enable instructions (we are not publishing to PyPI / GitHub Packages from Actions today).
- 7076d47 Dependabot: `cooldown.default-days: 7` on `maven` and `pip` ecosystems; `github-actions` ecosystem intentionally absent (SHAs are refreshed manually via `scripts/security/pin-gh-actions`), documented in a comment.
- 6bd5a0b Dockerfile + install_hadoop.sh hardening. `FROM ubuntu:24.04` pinned by multi-arch manifest-list digest `sha256:c4a8d5503dfb…41c7b`. Hadoop 3.4.0 (pinned SHA-512 from `downloads.apache.org`), GDAL 3.11.4 (pinned SHA-256; upstream only ships MD5, so we MD5-verified the tarball then computed SHA-256 locally), and Maven 3.9.9 (pinned SHA-512; previously did a dynamic `.sha512` fetch from the same origin as the tarball → no protection against origin compromise). `scripts/util/install_hadoop.sh` (unreferenced manual helper) hardened with `set -euo pipefail` + matching SHA-512 verification.

Policy items: coverage map
Reviewer notes — breaking-ish changes to double-check
Some Actions were pinned at a newer major than the tag the repo was previously using (commit 514871b):
- `actions/checkout` v5 → v6 (SHA `de0fac2e…`)
- `actions/upload-artifact` v5 → v7
- `actions/download-artifact` v5 → v8
- `actions/setup-node` v4 → v6
- `actions/setup-python` v5 → v6
- `actions/upload-pages-artifact` v3 → v4

The repo's workflows still accept `node20` runtime and the public API shapes are unchanged, but please confirm with a green CI run.

Operational prerequisites on the repo
Before merging:
- Create the `runtime` environment (Settings → Environments → New environment). No reviewers/wait-timer required initially; the environment binding itself is the gate for `REPO_ACCESS_TOKEN` scoping.
- Move `REPO_ACCESS_TOKEN` from repo-level secrets to the `runtime` environment's secrets so it can only be read by jobs that bind to it.
- `CODECOV_TOKEN` stays at the repo/org level (exempt secret, no environment needed).

Test plan
- `runtime` environment exists and `REPO_ACCESS_TOKEN` is scoped to it
- `build main` run (PR trigger path hits `update-doc-inventory` + `build`, both gated by `environment: runtime`)
- `deploy-docs` preview run still builds (doesn't deploy on PRs)
- `gbx:test:scala` + `gbx:test:python` pass in Docker (no behavior change expected, but Dockerfile was rewritten around the Hadoop/GDAL/Maven fetch sections)
- `scripts/security/list-external-actions` returns an empty problem list (every external ref is a SHA with a tag comment)

This pull request and its description were written by Isaac.